On the inference of large phylogenies with long branches: How long is too long?

Authors

  • Elchanan Mossel
  • Sébastien Roch
  • Allan Sly
Abstract

The accurate reconstruction of phylogenies from short molecular sequences is an important problem in computational biology. Recent work has highlighted deep connections between sequence-length requirements for high-probability phylogeny reconstruction and the related problem of the estimation of ancestral sequences. In Daskalakis et al. (in Probab. Theory Relat. Fields 2010), building on the work of Mossel (Trans. Am. Math. Soc. 356(6):2379-2404, 2004), a tight sequence-length requirement was obtained for the simple CFN model of substitution, that is, the case of a two-state symmetric rate matrix Q. In particular, the required sequence length for high-probability reconstruction was shown to undergo a sharp transition (from O(log n) to poly(n), where n is the number of leaves) at the "critical" branch length g_ML(Q) (if it exists) of the ancestral reconstruction problem, defined roughly as follows: below g_ML(Q) the sequence at the root can be accurately estimated from sequences at the leaves on deep trees, whereas above g_ML(Q) information decays exponentially quickly down the tree.

Here, we consider a more general evolutionary model, the GTR model, where the q×q rate matrix Q is reversible with q≥2. For this model, recent results of Roch (Preprint, 2009) show that the tree can be accurately reconstructed with sequences of length O(log n) when the branch lengths are below g_Lin(Q), known as the Kesten-Stigum (KS) bound, up to which ancestral sequences can be accurately estimated using simple linear estimators. Although for the CFN model g_ML(Q)=g_Lin(Q) (in other words, linear ancestral estimators are in some sense best possible), it is known that for the more general GTR models one has g_ML(Q)≥g_Lin(Q), with a strict inequality in many cases.

We make two contributions. First, we show that this phenomenon also holds for phylogenetic reconstruction by exhibiting a family of symmetric models Q and a phylogenetic reconstruction algorithm which recovers the tree from O(log n)-length sequences for some branch lengths in the range (g_Lin(Q), g_ML(Q)). Second, we prove that phylogenetic reconstruction under GTR models requires a polynomial sequence length for branch lengths above g_ML(Q).
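To make the Kesten-Stigum bound in the abstract concrete, the following is a minimal illustrative sketch for the two-state CFN case only. It is not the reconstruction algorithm of the paper. It rests on several assumptions that do not come from the abstract: the tree is a complete binary tree, each edge is parameterized by a flip probability p rather than a branch length g, the KS condition is taken in its standard form 2θ² > 1 with θ = 1 - 2p, and majority vote over the leaves stands in for a "simple linear estimator". All function names are hypothetical.

```python
import math
import random


def cfn_theta(p):
    """Second eigenvalue of the CFN edge channel with flip probability p."""
    return 1.0 - 2.0 * p


def below_ks_bound(p, arity=2):
    """Kesten-Stigum condition on a complete arity-ary tree: arity * theta^2 > 1."""
    return arity * cfn_theta(p) ** 2 > 1.0


def ks_critical_flip_probability(arity=2):
    """Flip probability at which arity * theta^2 equals 1 (assumed normalization)."""
    return 0.5 * (1.0 - 1.0 / math.sqrt(arity))


def simulate_leaves(depth, p, rng):
    """Evolve a uniform +/-1 root state down a complete binary tree.

    Each child copies its parent and flips independently with probability p.
    Returns the root state and the list of 2**depth leaf states.
    """
    root = rng.choice((-1, 1))
    level = [root]
    for _ in range(depth):
        next_level = []
        for state in level:
            for _ in range(2):
                next_level.append(-state if rng.random() < p else state)
        level = next_level
    return root, level


def majority_accuracy(depth, p, trials, seed=0):
    """Fraction of trials in which the sign of the leaf sum matches the root."""
    rng = random.Random(seed)
    correct = 0.0
    for _ in range(trials):
        root, leaves = simulate_leaves(depth, p, rng)
        total = sum(leaves)
        if total == 0:
            correct += 0.5  # a tie is scored as a fair coin flip, in expectation
        elif (total > 0) == (root > 0):
            correct += 1.0
    return correct / trials


if __name__ == "__main__":
    print("critical flip probability:", ks_critical_flip_probability())
    for p in (0.05, 0.10, 0.20, 0.40):
        acc = majority_accuracy(depth=8, p=p, trials=200)
        print(f"p={p:.2f}  below KS: {below_ks_bound(p)}  majority accuracy: {acc:.2f}")
```

Under these assumptions, majority vote should stay clearly better than a coin flip on deep trees when p is below the critical value, and drift toward 1/2 above it. The regime the abstract is concerned with, between g_Lin(Q) and g_ML(Q), is empty for CFN, so this sketch cannot illustrate it.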


Journal:
  • Bulletin of Mathematical Biology

Volume 73, Issue 7

Pages: -

Publication date: 2011